arXiv.org e-Print Archive

Linear regression for numeric symbolic variables: an ordinary least squares approach based on Wasserstein Distance

Author: A Irpino
Antonio Irpino
B Efron
CL Lawson
CL Mallows
E Diday
EAL Neto
EAL Neto
G Dall’Aglio
H Bock
J Arroyo
L Billard
L Kantorovich
L Wasserstein
M Noirhomme-Fraiture
P Bertrand
P Bickel
R Tibshirani
Rosanna Verde
WG Gilchrist
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 20/07/2012
Field of study

In this paper we present a linear regression model for modal symbolic data. The observed variables are histogram variables, as defined in the framework of Symbolic Data Analysis, and the parameters of the model are estimated using the classic Least Squares method. An appropriate metric is introduced to measure the error between the observed and the predicted distributions. In particular, the Wasserstein distance is proposed. Some properties of this metric are exploited to predict the response variable as a direct linear combination of other independent histogram variables. Measures of goodness of fit are discussed. An application to real data corroborates the proposed method.
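
As a rough illustration of the metric named in this abstract, the squared L2 Wasserstein distance between two one-dimensional distributions can be written as the integral over (0, 1) of the squared difference of their quantile functions. The sketch below approximates that integral from raw samples; the function name `wasserstein2_squared`, the grid size, and the use of samples instead of histograms are assumptions made for illustration and are not taken from the paper.

```python
import numpy as np


def wasserstein2_squared(sample_a, sample_b, n_grid=1000):
    """Approximate the squared L2 Wasserstein distance between two 1-D samples.

    Hypothetical helper: it compares empirical quantile functions on an even
    grid of probability levels, which approximates the integral over (0, 1).
    """
    # Midpoints of n_grid equal-width cells in (0, 1).
    levels = (np.arange(n_grid) + 0.5) / n_grid
    quantiles_a = np.quantile(sample_a, levels)
    quantiles_b = np.quantile(sample_b, levels)
    # The mean over an even grid approximates the integral.
    return float(np.mean((quantiles_a - quantiles_b) ** 2))


rng = np.random.default_rng(0)
x = rng.normal(size=500)
# Shifting a sample by a constant c moves every quantile by c,
# so the squared distance should be very close to c**2.
print(wasserstein2_squared(x, x + 2.0))
```

Working with quantile functions is what makes the one-dimensional case tractable: a pure location shift changes the distance only through the squared difference of the shifts.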

CiteSeerX

Archivio Istituzionale della Ricerca - Università degli Studi della Campania "Luigi Vanvitelli"

Assortment optimisation under a general discrete choice model: A tight analysis of revenue-ordered assortments

Author: A Abeliuk
A Aouad
A Schrijver
CL Mallows
D Fudenberg
D Luce
D McFadden
F Echenique
G Aggarwal
G Berbeglia
Gerardo Berbeglia
Gwenaël Joret
HD Block
HP Young
I Méndez-Díaz
J Cardinal
J Feldman
JB Feldman
JJM Bront
K Talluri
KT Talluri
LL Thurstone
MG Kendall
P Briest
P Chalermsook
P Manzini
P Rusmevichientong
P Rusmevichientong
P Rusmevichientong
P Rusmevichientong
R Webb
RD Luce
RH Koning
S Jagabathula
SP Anderson
TB Murphy
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 21/02/2019

The assortment problem in revenue management is the problem of deciding which subset of products to offer to consumers in order to maximise revenue. A simple and natural strategy is to select the best assortment out of all those that are constructed by fixing a threshold revenue $\pi$ and then choosing all products with revenue at least $\pi$. This is known as the revenue-ordered assortments strategy. In this paper we study the approximation guarantees provided by revenue-ordered assortments when customers are rational in the following sense: the probability of selecting a specific product from the set being offered cannot increase if the set is enlarged. This rationality assumption, known as regularity, is satisfied by almost all discrete choice models considered in the revenue management and choice theory literature, and in particular by random utility models. The bounds we obtain are tight and improve on recent results in that direction, such as for the Mixed Multinomial Logit model by Rusmevichientong et al. (2014). An appealing feature of our analysis is its simplicity, as it relies only on the regularity condition. We also draw a connection between assortment optimisation and two pricing problems called unit demand envy-free pricing and Stackelberg minimum spanning tree: these problems can be restated as assortment problems under discrete choice models satisfying the regularity condition, and moreover revenue-ordered assortments then correspond to the well-studied uniform pricing heuristic. When specialised to that setting, the general bounds we establish for revenue-ordered assortments match and unify the best known results on uniform pricing.
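
The revenue-ordered strategy described in this abstract can be sketched in a few lines: try each distinct revenue as the threshold, build the assortment of products at or above it, and keep the best one. The multinomial logit revenue function, the preference weights, and both function names below are assumptions chosen only to make the example runnable; the paper's analysis covers any choice model satisfying regularity.

```python
def mnl_expected_revenue(assortment, revenues, weights, no_purchase_weight=1.0):
    """Expected revenue of an assortment under a multinomial logit model.

    Assumed example model: product i is chosen with probability
    weights[i] / (no_purchase_weight + sum of weights in the assortment).
    """
    total = no_purchase_weight + sum(weights[i] for i in assortment)
    return sum(revenues[i] * weights[i] for i in assortment) / total


def best_revenue_ordered_assortment(revenues, expected_revenue):
    """Return the best assortment among those defined by a revenue threshold."""
    best_set, best_value = frozenset(), 0.0
    # Only the distinct revenue values need to be tried as thresholds.
    for threshold in sorted(set(revenues), reverse=True):
        candidate = frozenset(
            i for i, revenue in enumerate(revenues) if revenue >= threshold
        )
        value = expected_revenue(candidate)
        if value > best_value:
            best_set, best_value = candidate, value
    return best_set, best_value


revenues = [10.0, 8.0, 3.0]   # hypothetical per-product revenues
weights = [1.0, 1.0, 5.0]     # hypothetical preference weights
best_set, best_value = best_revenue_ordered_assortment(
    revenues, lambda s: mnl_expected_revenue(s, revenues, weights)
)
print(sorted(best_set), best_value)  # [0, 1] 6.0
```

In this toy instance, adding the cheap but popular third product lowers expected revenue from 6.0 to 4.125, so the search stops at the two high-revenue products.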

arXiv.org e-Print Archive

DI-fusion

Change in BMI Accurately Predicted by Social Exposure to Acquaintances

Author: A De Montis
A Kouvonen
Alex (Sandy) Pentland
Anmol Madan
AS Jackson
C Song
CL Mallows
D Brockmann
D Gallagher
G Chowell
H Akaike
I Kawachi
Inas Khayal
JH Fowler
JK Harris
L Isella
M Kivimaki
Manlio Vinciguerra
MC Gonzalez
NA Christakis
R Tibshirani
Rahman O. Oloritun
RW Jeffery
Sai Moturu
Taha B. M. J. Ouarda
Publication venue: 'Public Library of Science (PLoS)'
Publication date: 01/08/2013

Research has mostly focused on obesity and not on processes of BMI change more generally, although these may be key factors that lead to obesity. Studies have suggested that obesity is affected by social ties. However, these studies used survey-based data collection techniques that may be biased toward selecting only close friends and relatives. In this study, mobile phone sensing techniques were used to routinely capture social interaction data in an undergraduate dorm. By automating the capture of social interaction data, the limitations of self-reported social exposure data are avoided. This study attempts to understand and develop a model that best describes the change in BMI using social interaction data. We evaluated a cohort of 42 college students in a co-located university dorm, using data captured automatically via mobile phones and survey-based health-related information. We determined the most predictive variables for change in BMI using the least absolute shrinkage and selection operator (LASSO) method. The selected variables, with gender, healthy diet category, and ability to manage stress, were used to build multiple linear regression models that estimate the effect of exposure and individual factors on change in BMI. We identified the best model using the Akaike Information Criterion (AIC) and R^2. This study found a model that explains 68% (p<0.0001) of the variation in change in BMI. The model combined social interaction data, especially from acquaintances, and personal health-related information to explain change in BMI. This is the first study taking into account both interactions with different levels of social interaction and personal health-related information. Social interactions with acquaintances accounted for more than half the variation in change in BMI. This suggests the importance not only of individual health information but also of social interactions with the people we are exposed to, even people we may not consider close friends.

MIT Masdar Program; MIT Media Lab Consortiu

DSpace@MIT

The time-profile of cell growth in fission yeast: model selection criteria favoring bilinear models over exponential ones

Author: A Sveiczer
A Sveiczer
Akos Sveiczer
AL Mackay
CB Ward
CL Mallows
EI George
G Schwarz
H Akaike
H Akaike
H Miyata
HE Kubitschek
J Cullum
J Gabrielsson
JI Myung
JM Mitchison
JM Mitchison
JM Mitchison
JM Mitchison
JM Mitchison
JM Mitchison
M Stone
MA Pitt
P Buchwald
P Buchwald
Peter Buchwald
S Cooper
WD Donachie
Publication venue: BioMed Central
Publication date: 01/01/2006

BACKGROUND: There is considerable controversy concerning the exact growth profile of size parameters during the cell cycle. Linear, exponential and bilinear models are commonly considered, and the same model may not apply for all species. Selection of the most adequate model to describe a given data-set requires the use of quantitative model selection criteria, such as the partial (sequential) F-test, the Akaike information criterion and the Schwarz Bayesian information criterion, which are suitable for comparing differently parameterized models in terms of the quality and robustness of the fit but have not yet been used in cell growth-profile studies. RESULTS: Length increase data from representative individual fission yeast (Schizosaccharomyces pombe) cells measured on time-lapse films have been reanalyzed using these model selection criteria. To fit the data, an extended version of a recently introduced linearized biexponential (LinBiExp) model was developed, which makes possible a smooth, continuously differentiable transition between two linear segments and, hence, allows fully parametrized bilinear fittings. Despite relatively small differences, essentially all the quantitative selection criteria considered here indicated that the bilinear model was somewhat more adequate than the exponential model for fitting these fission yeast data. CONCLUSION: A general quantitative framework was introduced to judge the adequacy of bilinear versus exponential models in the description of growth time-profiles. For single cell growth, because of the relatively limited data-range, the statistical evidence is not strong enough to favor one model clearly over the other and to settle the bilinear versus exponential dispute. Nevertheless, for the present individual cell growth data for fission yeast, the bilinear model seems more adequate according to all metrics, especially in the case of wee1Δ cells
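
To make the model selection criteria mentioned in this abstract concrete, the sketch below computes the Akaike and Schwarz (Bayesian) information criteria for least-squares fits, using the common Gaussian-error forms AIC = n ln(RSS/n) + 2k and BIC = n ln(RSS/n) + k ln(n), where additive constants are dropped. The function name, the synthetic data, and the polynomial candidate models are illustrative assumptions; they are not the linear, exponential, or LinBiExp models fitted in the paper.

```python
import numpy as np


def aic_bic_least_squares(y, y_hat, n_params):
    """Return (AIC, BIC) for a least-squares fit, up to an additive constant.

    Hypothetical helper based on the residual sum of squares (RSS);
    lower values indicate a better trade-off between fit and complexity.
    """
    y = np.asarray(y, dtype=float)
    y_hat = np.asarray(y_hat, dtype=float)
    n = y.size
    rss = float(np.sum((y - y_hat) ** 2))
    base = n * np.log(rss / n)
    return base + 2 * n_params, base + n_params * np.log(n)


# Synthetic, made-up growth data: a straight line plus noise.
rng = np.random.default_rng(1)
time = np.linspace(0.0, 10.0, 40)
length = 7.0 + 0.6 * time + rng.normal(scale=0.2, size=time.size)

for degree in (1, 2):
    coefficients = np.polyfit(time, length, degree)
    fitted = np.polyval(coefficients, time)
    aic, bic = aic_bic_least_squares(length, fitted, n_params=degree + 1)
    print(f"degree {degree}: AIC={aic:.2f}, BIC={bic:.2f}")
```

Both criteria reward a smaller residual sum of squares and penalise extra parameters; BIC penalises them more heavily once n exceeds about 7, since ln(n) is then greater than 2.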

Springer - Publisher Connector

Kernel-imbedded Gaussian processes for disease classification using microarray gene expression data

Author: B Efron
C Robert
CL Mallows
DJC MacKay
EI George
G Schwarz
H Akaike
I Guyon
I Hedenfalk
J Khan
J Zhu
J Zhu
JT Kwok
KE Lee
Leo Wang-Kit Cheung
M Dettling
M Yuan
N Cristianini
P Tamayo
R Tibshirani
RM Neal
S Barnett
S Ramaswamy
TR Golub
TR Golub
TV Gestel
U Alon
X Zhou
X Zhou
X Zhou
X Zhou
Xin Zhao
Y Lin
Y Lin
Publication venue: BioMed Central
Publication date: 01/02/2007

BACKGROUND: Designing appropriate machine learning methods for identifying genes that have a significant discriminating power for disease outcomes has become more and more important for our understanding of diseases at the genomic level. Although many machine learning methods have been developed and applied to the area of microarray gene expression data analysis, the majority of them are based on linear models, which, however, are not necessarily appropriate for the underlying connection between the target disease and its associated explanatory genes. Linear model based methods also tend to bring in false positive significant features more easily. Furthermore, linear model based algorithms often involve calculating the inverse of a matrix that is possibly singular when the number of potentially important genes is relatively large. This leads to problems of numerical instability. To overcome these limitations, a few non-linear methods have recently been introduced to the area. Many of the existing non-linear methods have two critical problems, the model selection problem and the model parameter tuning problem, that remain unsolved or even untouched. In general, a unified framework that allows model parameters of both linear and non-linear models to be easily tuned is always preferred in real-world applications. Kernel-induced learning methods form a class of approaches that show promising potential to achieve this goal. RESULTS: A hierarchical statistical model named kernel-imbedded Gaussian process (KIGP) is developed under a unified Bayesian framework for binary disease classification problems using microarray gene expression data. In particular, based on a probit regression setting, an adaptive algorithm with a cascading structure is designed to find the appropriate kernel, to discover the potentially significant genes, and to make the optimal class prediction accordingly. A Gibbs sampler is built as the core of the algorithm to make Bayesian inferences. Simulation studies showed that, even without any knowledge of the underlying generative model, the KIGP performed very close to the theoretical Bayesian bound not only in the case with a linear Bayesian classifier but also in the case with a very non-linear Bayesian classifier. This sheds light on its broader usability for microarray data analysis problems, especially those for which linear methods work awkwardly. The KIGP was also applied to four published microarray datasets, and the results showed that the KIGP performed better than or at least as well as any of the referred state-of-the-art methods in all of these cases. CONCLUSION: Mathematically built on the kernel-induced feature space concept under a Bayesian framework, the KIGP method presented in this paper provides a unified machine learning approach to explore both the linear and the possibly non-linear underlying relationship between the target features of a given binary disease classification problem and the related explanatory gene expression data. More importantly, it incorporates the model parameter tuning into the framework. The model selection problem is addressed in the form of selecting a proper kernel type. The KIGP method also gives Bayesian probabilistic predictions for disease classification. These properties and features are beneficial to most real-world applications. The algorithm is naturally robust in numerical computation. The simulation studies and the published data studies demonstrated that the proposed KIGP performs satisfactorily and consistently.

Springer - Publisher Connector

Lazy Lasso for local regression

Author: A Hoerl
AS Fotheringham
B Efron
C Loader
CL Mallows
Concha Bielza
D Donoho
D Ruppert
DC Wheeler
Diego Vidaurre
DM Allen
EB Fowlkes
F Ferraty
F Ferraty
F Ferraty
GAF Seber
H Wang
H Zou
H Zou
J Barrientos-Marin
J Fan
J Fan
J Lafferty
J Ramsay
JA Khan
JP Jones
K Knight
L Breiman
L Grosenick
N Meinshausen
P Larrañaga
P Zhao
Pedro Larrañaga
R Tibshirani
RE Kass
S Ma
S Weisberg
SD Foster
SJ Devlin
T Hastie
T Hesterberg
WS Cleveland
WS Cleveland
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2011

Locally weighted regression is a technique that predicts the response for new data items from their neighbors in the training data set, where closer data items are assigned higher weights in the prediction. However, the original method may suffer from overfitting and fail to select the relevant variables. In this paper we propose combining a regularization approach with locally weighted regression to achieve sparse models. Specifically, the lasso is a shrinkage and selection method for linear regression. We present an algorithm that embeds the lasso in an iterative procedure that alternately computes weights and performs lasso-wise regression. The algorithm is tested on three synthetic scenarios and two real data sets. Results show that the proposed method outperforms linear and local models in several kinds of scenarios.
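
For readers unfamiliar with the baseline this abstract builds on, here is a minimal sketch of plain locally weighted linear regression: training points near the query get larger weights, and a weighted least-squares line is fitted around the query. The Gaussian kernel, the `bandwidth` parameter, and the function name are assumptions for illustration, and the lasso penalty and the iterative reweighting proposed in the paper are deliberately left out.

```python
import numpy as np


def local_linear_predict(x_train, y_train, x_query, bandwidth=1.0):
    """Predict the response at x_query with a locally weighted linear fit.

    Hypothetical helper: Gaussian kernel weights, no lasso penalty.
    """
    X = np.asarray(x_train, dtype=float)
    if X.ndim == 1:
        X = X[:, None]
    y = np.asarray(y_train, dtype=float)
    query = np.atleast_1d(np.asarray(x_query, dtype=float))

    # Closer training points receive higher weights.
    squared_distance = np.sum((X - query) ** 2, axis=1)
    weights = np.exp(-squared_distance / (2.0 * bandwidth ** 2))

    # Centre the design at the query so the intercept is the prediction.
    design = np.hstack([np.ones((X.shape[0], 1)), X - query])
    root_w = np.sqrt(weights)
    coefficients, *_ = np.linalg.lstsq(
        design * root_w[:, None], y * root_w, rcond=None
    )
    return float(coefficients[0])


x = np.linspace(0.0, 10.0, 50)
y = 3.0 * x + 1.0  # exactly linear, so the local fit should recover it
print(local_linear_predict(x, y, 4.2))
```

Scaling the rows of the design matrix and the responses by the square root of the weights turns the weighted problem into an ordinary least-squares problem, which is why a single `np.linalg.lstsq` call is enough here.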

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

Oxford University Research Archive

Archivo Digital UPM

Training compliance control yields improvements in drawing as a function of Beery scores

Author: A Myronenko
Aaron Fath
AC Lo
AD Wilson
AG Feldman
Andrew Bremner
B Smits-Engelsman
BE Studenka
BN Wilson
C Gonzalez
C von Hofsten
CD Takahashi
CJ Daly
CL Mallows
DI McCloskey
DJ Reinkensmeyer
E Thelen
G Kwakkel
G Montagne
Geoffrey P. Bingham
GM Goodwin
H Ben-Pazi
H Cornhill
H Van Waelvelde
IAM Beets
Ian Flatters
J Bastin
J Bo
JC Dessing
JD Wong
JK Nelson
KM Deutsch
KM Newell
L Marchal-Crespo
Mark Mon-Williams
MJM Volman
ML Kaiser
MR Wild
MT Turvey
N Getchell
PR Culmer
PR Davidson
S Rodger
W Snapp-Childs
W Snapp-Childs
Winona Snapp-Childs
Publication venue: 'Public Library of Science (PLoS)'
Publication date: 01/01/2014

Many children have difficulty producing movements well enough to improve in sensori-motor learning. Previously, we developed a training method that supports active movement generation to allow improvement at a 3D tracing task requiring good compliance control. Here, we tested 7–8 year old children from several 2nd grade classrooms to determine whether 3D tracing performance could be predicted using the Beery VMI. We also examined whether 3D tracing training led to improvements in drawing. Baseline testing included Beery, a drawing task on a tablet computer, and 3D tracing. We found that baseline performance in 3D tracing and drawing co-varied with the visual perception (VP) component of the Beery. Differences in 3D tracing between children scoring low versus high on the Beery VP replicated differences previously found between children with and without motor impairments, as did post-training performance that eliminated these differences. Drawing improved as a result of training in the 3D tracing task. The training method improved drawing and reduced differences predicted by Beery scores.

CiteSeerX

Public Library of Science (PLOS)

White Rose Research Online

An Information Theory Approach to Hypothesis Testing in Criminological Research

Author: American Community Survey
B Feldmeyer
C Flather
CJ Sullivan
CL Mallows
CP Haberman
CR Rao
D Karlis
D Steffensmeier
D Weisburd
D Weisburd
DJH Sleep
DN McCloskey
DR Anderson
E Groff
E Groff
G Schwarz
GA Petrossian
H Akaike
J Ripplinger
K Burnham
K Takeuchi
KP Burnham
KP Burnham
L Zhu
LZ Garamszegi
M Livingston
MD Maltz
MJ Mazerolle
MRE Symonds
P Wilcox
PJ Gruenewald
PL Brantingham
R Berk
RJ Steidl
RL Block
RL Wasserstein
S Johnson
SD Bushway
ST Ziliak
SV Yu
TF Fondell
W Baumol
W Bernasco
W Bernasco
WA Pridemore
Y Lee
Publication venue: CUNY Academic Works
Publication date: 01/01/2018

Background: This research demonstrates how the Akaike information criterion (AIC) can be an alternative to null hypothesis significance testing in selecting best fitting models. It presents an example to illustrate how AIC can be used in this way. Methods: Using data from Milwaukee, Wisconsin, we test models of place-based predictor variables on street robbery and commercial robbery. We build models to balance explanatory power and parsimony. Measures include the presence of different kinds of businesses, together with selected age groups and social disadvantage. Results: Models including place-based measures of land use emerged as the best models among the set of tested models. These were superior to models that included measures of age and socioeconomic status. The best models for commercial and street robbery include three measures of ordinary businesses, liquor stores, and spatial lag. Conclusions: Models based on information theory offer a useful alternative to significance testing when a strong theoretical framework guides the selection of model sets. Theoretically relevant ‘ordinary businesses’ have a greater influence on robbery than socioeconomic variables and most measures of discretionary businesses

City University of New York